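$S_{sim}(x, y)$ is the semantic similarity between typed-terms x and y, which can be calculated directly as the cosine similarity between their concept cluster vectors $\vec{C}_x$ and $\vec{C}_y$:

$$S_{sim}(x, y) = \mathrm{cosine}(\vec{C}_x, \vec{C}_y) \tag{2}$$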
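
The cosine similarity between two concept cluster vectors can be sketched as follows. This is a minimal illustration, assuming each vector is stored sparsely as a dictionary from concept cluster name to weight; the function name `cosine_similarity` and the toy vectors are hypothetical and do not come from the original work.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as {cluster: weight}."""
    # Only clusters present in both vectors contribute to the dot product.
    dot = sum(weight * v[cluster] for cluster, weight in u.items() if cluster in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical concept cluster vectors for two typed-terms.
c_x = {"fruit": 0.8, "company": 0.2}
c_y = {"fruit": 0.6, "food": 0.4}
sim = cosine_similarity(c_x, c_y)
```

Storing the vectors as dictionaries keeps the computation proportional to the number of non-zero clusters, which seems a natural fit when each typed-term maps to only a few concept clusters.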

$S_{co}(x, y)$ measures the semantic relatedness between typed-terms x and y. We denote the co-occurrence concept cluster vector of typed-term x as $\vec{C}_{co(x)}$, which can be retrieved from the compressed co-occurrence network, and the concept cluster vector of typed-term y as $\vec{C}_y$. We observe that the larger the overlap between these two concept cluster vectors, the stronger the relatedness between typed-terms x and y. Therefore, we calculate $S_{co}(x, y)$ as follows:

                (3)

To determine a valid segmentation, the following heuristics are used.

(a) Except for stop words, each word belongs to one and only one term.

(b) Terms are coherent (i.e., terms mutually reinforce each other).
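
Heuristic (a) can be checked mechanically. The sketch below is only an illustration under stated assumptions: a segmentation is represented as a list of half-open `(start, end)` word-index spans, and the function name `satisfies_word_coverage` is hypothetical. Heuristic (b) is not covered here, since it depends on the affinity scores between terms.

```python
def satisfies_word_coverage(words, term_spans, stop_words):
    """Check heuristic (a): every non-stop word lies in exactly one term span."""
    cover = [0] * len(words)
    for start, end in term_spans:
        for i in range(start, end):
            cover[i] += 1
    # Stop words are exempt from the check, as stated in the heuristic.
    return all(
        cover[i] == 1
        for i, word in enumerate(words)
        if word not in stop_words
    )


# Hypothetical short text and stop-word list.
words = ["april", "in", "paris", "lyrics"]
stop_words = {"in"}

ok = satisfies_word_coverage(words, [(0, 3), (3, 4)], stop_words)       # "april in paris" + "lyrics"
missing = satisfies_word_coverage(words, [(0, 1), (2, 3)], stop_words)  # "lyrics" is uncovered
overlap = satisfies_word_coverage(words, [(0, 3), (2, 4)], stop_words)  # "paris" is covered twice
```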

We use a graph to represent candidate terms and their relationships. In this work, we define two types of relations among candidate terms.

Mutual Exclusion – Candidate terms that contain the same word are mutually exclusive.

Mutual Reinforcement – Candidate terms that are semantically related mutually reinforce each other.

 

A. Term graph construction

Based on the above two types of relations, we construct a term graph TG, where each node is a candidate term. We associate each node with a weight representing its coverage of words in the short text, excluding stop words. We add an edge between two candidate terms when they are not mutually exclusive, and set the edge weight to reflect the strength of mutual reinforcement as

$$w(x, y) = \max\left(\epsilon,\ \max_{i,j} S(x_i, y_j)\right) \tag{4}$$

where $\epsilon$ is a small positive weight, $\{x_i\}$ is the set of
typed-terms for term x, $\{y_j\}$ is the set of typed-terms for term y, and
$S(x_i, y_j)$ is the affinity score between typed-terms $x_i$ and $y_j$
defined in Eq. (1). Since a term may potentially map to multiple typed-terms,
we define the edge weight between two candidate terms as the maximum affinity
score between their corresponding typed-terms. When two terms are not related,
the edge weight is set to be slightly larger than 0 (to guarantee the
feasibility of a Monte Carlo
algorithm).
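
The term graph construction described in this subsection can be sketched as follows. This is a rough illustration under several assumptions that are not in the original text: candidate terms are given as half-open word-index spans, the node weight is the fraction of non-stop words a term covers, the affinity score of Eq. (1) is supplied by the caller as a function, and the value of the small positive weight is chosen arbitrarily. The names `build_term_graph`, `affinity` and `epsilon` are hypothetical.

```python
from itertools import combinations


def build_term_graph(words, stop_words, candidates, typed_terms, affinity, epsilon=1e-3):
    """Build node weights and edge weights for a term graph.

    candidates:  {term: (start, end)} half-open word-index spans (assumed format)
    typed_terms: {term: [typed_term, ...]}
    affinity:    callable(typed_x, typed_y) -> float, standing in for Eq. (1)
    """
    content = [i for i, word in enumerate(words) if word not in stop_words]

    # Node weight: share of non-stop words covered by the term (an assumption).
    nodes = {}
    for term, (start, end) in candidates.items():
        covered = sum(1 for i in content if start <= i < end)
        nodes[term] = covered / len(content) if content else 0.0

    edges = {}
    for a, b in combinations(candidates, 2):
        (start_a, end_a), (start_b, end_b) = candidates[a], candidates[b]
        if start_a < end_b and start_b < end_a:
            continue  # mutual exclusion: the spans share a word, so no edge
        # Maximum affinity over all pairs of typed-terms, floored at epsilon.
        best = max(
            (affinity(s, t) for s in typed_terms[a] for t in typed_terms[b]),
            default=0.0,
        )
        edges[(a, b)] = max(epsilon, best)
    return nodes, edges


# Hypothetical input; the affinity scores are made up for illustration.
words = ["april", "in", "paris", "lyrics"]
candidates = {"april in paris": (0, 3), "april": (0, 1), "paris": (2, 3), "lyrics": (3, 4)}
typed_terms = {
    "april in paris": ["april in paris/song"],
    "april": ["april/month"],
    "paris": ["paris/city"],
    "lyrics": ["lyrics/concept"],
}
toy_scores = {frozenset(["april in paris/song", "lyrics/concept"]): 0.9}


def toy_affinity(s, t):
    return toy_scores.get(frozenset([s, t]), 0.0)


nodes, edges = build_term_graph(words, {"in"}, candidates, typed_terms, toy_affinity)
```

In this toy input, "april in paris" gets no edge to "april" or "paris" because they share words, while unrelated but compatible pairs such as "april" and "paris" still receive the small weight `epsilon`, so every pair of non-exclusive terms stays connected.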
