Classification task runs locally, but the evaluation step fails on the cluster: "PrecisionRecallEvaluator" #2941

Closed
0YuanZhang0 opened this issue Jul 18, 2017 · 2 comments

0YuanZhang0 commented Jul 18, 2017

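The network is a classification model. It runs successfully on my local machine, but when submitted to the cluster it fails right after it starts running: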

............................*** Aborted at 1500366428 (unix time) try "date -d @1500366428" if you are using GNU date *** 
*** Aborted at 1500366428 (unix time) try "date -d @1500366428" if you are using GNU date *** 
PC: @ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo() 
*** SIGFPE (@0x70d722) received by PID 15143 (TID 0x7fe0872d0880) from PID 7395106; stack trace: *** 
@ 0x7fe086ec0160 (unknown) 
@ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo() 
@ 0x70f0f0 paddle::PrecisionRecallEvaluator::evalImp() 
@ 0x70ecbe paddle::Evaluator::eval() 
@ 0x745f98 paddle::CombinedEvaluator::eval() 
@ 0x7393b4 paddle::MultiGradientMachine::eval() 
@ 0x78d64a paddle::TrainerInternal::trainOneBatch() 
@ 0x787dcf paddle::Trainer::trainOnePass() 
@ 0x78b494 paddle::Trainer::train() 
@ 0x5c02e3 main 
@ 0x7fe08549bbd5 __libc_start_main 
@ 0x5cf9a1 (unknown) 
PC: @ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo() 
*** SIGFPE (@0x70d722) received by PID 25723 (TID 0x7ff8b3aaf880) from PID 7395106; stack trace: *** 

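The network configuration is below. The layers.py in the cluster version has no seq_reshape_layer, so I modified that file locally to add seq_reshape_layer and then submitted the job to the cluster: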

data_word = data_layer(name="word", size=num_word)
data_postag = data_layer(name="postag", size=num_postag)
data_arc = data_layer(name="arc", size=num_arc)
if not is_predict: 
    data_label = data_layer(name="label", size=num_classes)

word_attr = ParameterAttribute(initial_std=1/8.0, initial_mean=0.0)
tag_attr = ParameterAttribute(initial_std=1/4.0, initial_mean=0.0)
label_attr = ParameterAttribute(initial_std=1/4.0, initial_mean=0.0)

embedding_word = embedding_layer(input=data_word, size=word_dim, param_attr=word_attr)
srl_word = seq_reshape_layer(input=embedding_word, reshape_size=20*word_dim)
embedding_postag = embedding_layer(input=data_postag, size=postag_dim, param_attr=tag_attr)
srl_tag = seq_reshape_layer(input=embedding_postag, reshape_size=20*postag_dim)
embedding_arc = embedding_layer(input=data_arc, size=arc_dim, param_attr=label_attr)
srl_arc = seq_reshape_layer(input=embedding_arc, reshape_size=12*arc_dim)

concat = concat_layer(input=[srl_word, srl_tag, srl_arc], act=LinearActivation())
bias_attr = ParameterAttribute(initial_std=0., l2_rate=0.0001)
w_attr = ParameterAttribute(initial_std=1e-4, initial_mean=0.0)
hidden1 = fc_layer(input=concat, size=hidden_dim, act=ReluActivation(), param_attr=w_attr, bias_attr=bias_attr)
hidden2 = fc_layer(input=hidden1, size=hidden_dim, act=ReluActivation(), param_attr=w_attr, bias_attr=bias_attr)
output = fc_layer(input=hidden2, size=num_classes, act=SoftmaxActivation(), param_attr=w_attr, bias_attr=bias_attr)

if not is_predict: 
    cls_loss = classification_cost(input=output, label=data_label, evaluator=[precision_recall_evaluator, classification_error_evaluator])
    outputs(cls_loss)
else: 
    outputs(output)

Job link: http://yq01-idl-gpu-offline62.yq01.baidu.com:8880/output/list/9066

@Superjomn

This is a floating-point exception (SIGFPE).

See #2563 (comment)

It is only one idea, but it is worth trying.

I am not sure this is actually related to seq_reshape_layer.
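For readers unfamiliar with SIGFPE: the signal reports an erroneous arithmetic operation, and the trace above places it inside the precision/recall statistics step. The sketch below is not Paddle's `calcStatsInfo()` implementation. It is a minimal, hypothetical illustration of how precision/recall arithmetic can be guarded against zero denominators, and how model outputs can be checked for NaN/Inf before evaluation. The helper names `precision_recall` and `has_non_finite` are assumptions made for this example.

```python
import math


def precision_recall(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts.

    Hypothetical helper, not Paddle code: any ratio whose
    denominator is zero is reported as 0.0 instead of dividing.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def has_non_finite(values):
    """Return True if any value is NaN or +/-Inf (e.g. after training diverges)."""
    return any(not math.isfinite(v) for v in values)


if __name__ == "__main__":
    # A class that was never predicted and never appears in the labels.
    print(precision_recall(0, 0, 0))
    # Softmax outputs containing NaN should be caught before evaluation.
    print(has_non_finite([0.2, float("nan"), 0.8]))
```

A check like `has_non_finite` on the network outputs for a few batches can help tell apart a diverging optimizer from a problem in the evaluator itself.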

@0YuanZhang0

After switching the optimization method to learning_method=AdamOptimizer(), it runs normally. Thanks for your help.
