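Course name: Natural Language Processing
Course type: Departmental elective
Instructor: 陳信希
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y/M/D): 2016/04/21
Time limit (minutes): 180
Questions: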
01. Machine translation (MT) is one of the practical applications of NLP. The
development of MT systems has a long history, but there is still room for
improvement. Please describe two linguistic phenomena that explain why
MT is challenging. (10pts)
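02. An NLP system can be implemented as a pipeline, including modules for
morphological processing, syntactic analysis, semantic interpretation,
and context analysis. Please use the following news story to describe
the concepts behind these modules. You are asked to mention one task in
each module. (10pts)
這場地震可能影響日相安倍晉三的施政計畫。安倍十八日說,消費稅調漲的
計畫不會改變。
(English gloss: "This earthquake may affect the policy agenda of Japanese
Prime Minister 安倍晉三 (Shinzo Abe). Abe said on the 18th that the plan to
raise the consumption tax will not change.")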
03. Ambiguity is inherent in natural language. Please describe why ambiguity
may happen in each of the following cases. (10pts)
(a) Prepositional phrase attachment.
(b) Noun-noun compound.
(c) Word: bass
04. Why is the extraction of multiword expressions critical for NLP
applications? Please propose a method to check whether an extracted multiword
expression meets the non-compositionality criterion. (10pts)
05. Mutual information and likelihood ratio are commonly used to find
collocations in a corpus. Please describe the ideas of these two methods.
(10pts)
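As an illustrative sketch only (not an official answer), the two association measures named in question 05 can be computed from corpus counts as follows. The function names and all counts are hypothetical; the likelihood ratio follows the common binomial formulation, where H1 assumes P(w2|w1) = P(w2|not w1) and H2 lets the two probabilities differ.

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information: log2( P(x,y) / (P(x) P(y)) )."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

def _log_l(k, n, p):
    """Log binomial likelihood of k successes in n trials (constant dropped)."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1.0 - p)
    return out

def log_likelihood_ratio(c_xy, c_x, c_y, n):
    """-2 log(lambda): large values suggest x and y are not independent."""
    p = c_y / n                        # H1: one shared probability of y
    p1 = c_xy / c_x                    # H2: P(y | x)
    p2 = (c_y - c_xy) / (n - c_x)      # H2: P(y | not x)
    log_lambda = (_log_l(c_xy, c_x, p) + _log_l(c_y - c_xy, n - c_x, p)
                  - _log_l(c_xy, c_x, p1) - _log_l(c_y - c_xy, n - c_x, p2))
    return -2.0 * log_lambda

# Hypothetical counts, invented for illustration only.
assoc_pmi = pmi(c_xy=50, c_x=100, c_y=200, n=1_000_000)
assoc_llr = log_likelihood_ratio(c_xy=50, c_x=100, c_y=200, n=1_000_000)
indep_llr = log_likelihood_ratio(c_xy=2, c_x=100, c_y=200, n=10_000)
```

PMI compares the observed joint probability with the one expected under independence, while the likelihood ratio asks how much better the data are explained when the two words are allowed to depend on each other.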
06. Emoticons are commonly used in social media. They can be regarded as a
special vocabulary in a language. Understanding emoticons helps in
understanding the utterances in an interaction. Please propose an "emoticon"
embedding approach that represents each emoticon as a vector, and find the
5 most relevant words for each emoticon. (10pts)
07. To deal with unseen n-grams, smoothing techniques are adopted in the
conventional language modeling approach. They are applied to n-grams to
reallocate probability mass from observed n-grams to unobserved n-grams,
producing better estimates for unseen data. Please show a smoothing
technique for the conventional language model, and discuss why a neural
network language model (NNLM) can achieve better generalization for unseen
n-grams. (10pts)
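As one possible illustration of the reallocation of probability mass described in question 07 (not an official answer), the sketch below applies add-one (Laplace) smoothing to a bigram model. The toy corpus and the function names are hypothetical, and add-one is only one of several smoothing techniques that could be shown here.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count bigram contexts and bigrams, with <s> and </s> boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {w for sent in sentences for w in sent} | {"</s>"}
    return contexts, bigrams, vocab

def laplace_prob(prev, word, contexts, bigrams, vocab):
    """Add-one estimate: (c(prev, word) + 1) / (c(prev) + |V|)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

# Hypothetical toy corpus, for illustration only.
corpus = [["i", "like", "nlp"], ["i", "like", "mt"]]
contexts, bigrams, vocab = train_bigram_counts(corpus)

p_seen = laplace_prob("like", "nlp", contexts, bigrams, vocab)    # observed bigram
p_unseen = laplace_prob("like", "i", contexts, bigrams, vocab)    # unobserved bigram
total = sum(laplace_prob("like", w, contexts, bigrams, vocab) for w in vocab)
```

The unobserved bigram receives a non-zero probability, and the estimates for the context "like" still sum to one, so the mass given to unseen events is taken from the observed ones.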
08. In HMM learning, we aim at inferring the best model parameters, given a
skeletal model and an observation sequence. The following two equations
are used to compute the state transition probabilities.
    \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i,j)}

    \xi_t(i,j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{\alpha_T(q_F)}
Please answer the following questions. (10pts)
(a) Intuitively, we could generate all possible paths for the given
observation sequence and count the total number of times each transition
is taken along those paths. Which part of the above equations avoids the
generation of all possible paths?
(b) Which part of the above equations prorates the counts used to
estimate the probability of a transition?
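The two equations in question 08 can be traced on a toy HMM with the following minimal sketch (not an official answer). All parameter values are hypothetical. One simplifying assumption differs from the exam's notation: there is no explicit final state q_F, so the normaliser is taken as the sum of the last forward column instead of α_T(q_F), and β at the last time step is set to 1.

```python
def forward(pi, A, B, obs):
    """alpha[t][j]: probability of o_1..o_t with the chain in state j at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i]: probability of o_{t+1}..o_T given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]                   # assumption: no explicit final state
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def reestimate_transitions(pi, A, B, obs):
    """Return (expected transition counts, re-estimated transition matrix)."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])          # stands in for alpha_T(q_F)
    xi_sum = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j]) / likelihood
    a_hat = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)] for i in range(n)]
    return xi_sum, a_hat

# Hypothetical two-state model with two observation symbols (0 and 1).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]
xi_sum, a_hat = reestimate_transitions(pi, A, B, obs)
```

The forward and backward tables summarise all paths through each state, so no path is ever enumerated explicitly, and dividing by the likelihood turns each product into a fractional (expected) count.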
09. Many NLP problems can be cast as sequence labelling problems. Part-of-
speech tagging is a typical example. Given a model and an observation
sequence, we aim at finding the most probable state sequence. Please
explain why this process is called a decoding process. In addition, please
give another application that can also be treated as a sequence labelling
problem. (10pts)
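Finding the most probable state sequence, as described in question 09, is commonly done with the Viterbi algorithm. The sketch below is a minimal illustration on a two-tag HMM tagger; the tags, words, and probabilities are all hypothetical and are not taken from the exam.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its joint probability with obs)."""
    # delta[t][s]: probability of the best path that ends in state s at time t.
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    backptr = [{}]
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[-1][r] * trans_p[r][s])
            scores[s] = delta[-1][best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            ptrs[s] = best_prev
        delta.append(scores)
        backptr.append(ptrs)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for ptrs in reversed(backptr[1:]):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Hypothetical toy tagger with two tags (N, V) and two words.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.3, "sleep": 0.7}}

best_path, best_prob = viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p)
```

For this toy input the best path is N followed by V, recovered from the observed words alone, which is the sense in which the hidden tag sequence is "decoded" from the observations.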
10. What are long-distance dependencies (also called unbounded dependencies)?
Why are such linguistic phenomena challenging in NLP? (10pts)
11. Part of speech tagging can be formulated in the following two alternatives:
Model 1: \hat{t}_1^n = argmax_{t_1^n} Π_{i=1}^n P(w_i|t_i) P(t_i|t_{i-1})
Model 2: \hat{t}_1^n = argmax_{t_1^n} Π_{i=1}^n P(t_i|w_i, t_{i-1})
Please answer the following questions. (10pts)
(a) Which one is a discriminative model?
(b) Which one can incorporate more features?
(c) Which one can use the Viterbi algorithm to improve the speed?
(d) Which one is derived on the basis of Bayes' rule?
12. The following parse tree is selected from Chinese Treebank 8.0. What NP
and VP rules can be extracted from this parse tree to form part of a
treebank grammar? (10pts)
( (IP (IP (NP-SBJ (NN 建築))
          (VP (VC 是)
              (NP-PRD (CP-APP (IP (NP-SBJ (-NONE- *pro*))
                                  (VP (VV 開辟)
                                      (NP-PN-OBJ (NR 浦東))))
                              (DEC 的))
                      (QP (CD 一)
                          (CLP (M 項)))
                      (ADJP (JJ 首要))
                      (NP (NN 經濟)
                          (NN 舉動)))))
      (PU 。)
      (IP (NP-SBJ (-NONE- *pro*))
          (VP (DP-TMP (DT 這些)
                      (CLP (M 年)))
              (VP (VE 有)
                  (IP-OBJ (NP-SBJ (NP (QP (CD 數百)
                                          (CLP (M 家)))
                                      (NP (NN 建築)
                                          (NN 公司)))
                                  (PU 、)
                                  (NP (QP (CD 四千餘)
                                          (CLP (M 個)))
                                      (NP (NN 建築)
                                          (NN 工地))))
                          (VP (VV 遍布)
                              (PP-LOC (P 在)
                                      (LCP (NP (DP (DT 這)
                                                   (CLP (M 片)))
                                               (NP (NN 熱土)))
                                           (LC 上))))))))
      (PU 。)) )
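Reading rules off a bracketed tree, as question 12 asks, can be automated. The sketch below (an illustration, not an official answer) parses only the first clause of the tree above and lists the rules whose left-hand side is an NP or VP category. Two choices are assumptions: function tags such as -SBJ are kept as part of the label, and lexical rules (tag to word) are skipped.

```python
import re

def tokenize(text):
    """Split a bracketed tree into '(', ')' and symbol tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    """Parse one constituent from the front of the token list."""
    tokens.pop(0)                        # consume "("
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))
        else:
            children.append(tokens.pop(0))   # terminal word
    tokens.pop(0)                        # consume ")"
    return (label, children)

def extract_rules(node, rules):
    """Collect 'LHS -> RHS' strings for non-lexical rules, in pre-order."""
    label, children = node
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:
        rules.append(label + " -> " + " ".join(c[0] for c in subtrees))
        for child in subtrees:
            extract_rules(child, rules)
    return rules

# First clause of the tree above, written on fewer lines.
clause = """
(IP (NP-SBJ (NN 建築))
    (VP (VC 是)
        (NP-PRD (CP-APP (IP (NP-SBJ (-NONE- *pro*))
                            (VP (VV 開辟) (NP-PN-OBJ (NR 浦東))))
                        (DEC 的))
                (QP (CD 一) (CLP (M 項)))
                (ADJP (JJ 首要))
                (NP (NN 經濟) (NN 舉動)))))
"""

all_rules = extract_rules(parse(tokenize(clause)), [])
# Keep rules whose base category (the part before any function tag) is NP or VP.
np_vp_rules = [r for r in all_rules if r.split(" -> ")[0].split("-")[0] in {"NP", "VP"}]
```

Whether function tags should be stripped (for example, treating NP-SBJ as NP) depends on the grammar being built; the filter above only uses the base category to decide which rules to report.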
Source: https://www.ptt.cc/bbs/NTU-Exam/M.1467630306.A.186.html
[Exam] Academic year 104, 2nd semester, 陳信希, Natural Language Processing, midterm