Why does a model fail to converge during training?

1. The backpropagation chain is broken.

That is, some variables may have been converted into NumPy arrays. They can still take part in the computation, but they no longer carry gradients, so the gradient cannot be propagated back to the variables they were computed from.
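A minimal sketch of this failure, using a toy scalar autograd class instead of a real framework. The `Value` class and all names here are made up for illustration; re-wrapping `h2.data` stands in for a round trip through a NumPy array, which is an analogy and not any library's actual API.

```python
class Value:
    """Toy scalar that records how it was computed (reverse-mode autodiff)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # local derivative rule for each parent

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(
            self.data * other.data,
            parents=(self, other),
            grad_fns=(lambda g: g * other.data, lambda g: g * self.data),
        )

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then pass it on to every parent.
        self.grad += upstream
        for parent, grad_fn in zip(self._parents, self._grad_fns):
            parent.backward(grad_fn(upstream))


# Intact chain: loss = (2w)^2, so d(loss)/dw = 8w.
w = Value(3.0)
h = w * 2.0
loss = h * h
loss.backward()
intact_grad = w.grad

# Broken chain: only the raw number survives, the link back to w2 is lost.
w2 = Value(3.0)
h2 = w2 * 2.0
h2_detached = Value(h2.data)   # hypothetical stand-in for a NumPy round trip
loss2 = h2_detached * h2_detached
loss2.backward()
broken_grad = w2.grad

print(intact_grad, broken_grad)
```

Both losses have the same value, but only the first one sends a gradient back to its weight. In the second case the weight would never be updated, however long training runs.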

2. The learning rate is set unreasonably.

If the learning rate is too high, the loss easily becomes NaN and the model fails to converge; if it is too low, the model learns very slowly.
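A rough illustration on the simplest possible problem, plain gradient descent on f(w) = w^2. The function name and the three learning rates are arbitrary choices for this sketch. On this toy problem an oversized step makes the loss grow without bound; in a real network the same blow-up is what usually ends in inf or NaN.

```python
def final_loss(lr, steps=50, w0=1.0):
    """Run gradient descent on f(w) = w**2 and return the final loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad
    return w * w


loss_too_high = final_loss(lr=1.1)      # every step overshoots, |w| grows
loss_too_low = final_loss(lr=0.001)     # barely moves from the initial loss of 1.0
loss_reasonable = final_loss(lr=0.3)    # shrinks quickly towards 0

print(loss_too_high, loss_too_low, loss_reasonable)
```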

3. The parameters of the network layers are not initialized correctly.

Parameter initialization affects how quickly the model trains.

4. The gradients of the network parameters are not clipped.

Without gradient clipping, the gradients may explode, and the model can no longer propagate a useful gradient back.
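One common form of gradient clipping rescales the whole gradient vector when its norm exceeds a threshold. The sketch below works on a plain list of floats; `clip_by_global_norm` and the threshold of 5.0 are illustrative assumptions, not any particular framework's function or default.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


exploding = [300.0, -400.0]                       # L2 norm is 500
clipped, norm_before = clip_by_global_norm(exploding, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

print(norm_before, norm_after, clipped)
```

The direction of the gradient is kept and only its length is capped, so a single huge gradient cannot throw the weights far away in one step.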

5. Not enough training iterations.

A model learns good features only after enough training. If training is stopped too early, it will not have learned them.

6. The training batch size is too small, so the loss value fluctuates and gives the false impression that the model is not converging.
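To see how much of the fluctuation is just sampling noise, the sketch below holds the "model" fixed and only varies the batch size. The per-sample losses are synthetic (drawn from a Gaussian with a fixed seed), and the batch sizes 2 and 64 are arbitrary; all of this is assumed for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic per-sample losses for a model whose weights are not changing.
per_sample_losses = [rng.gauss(1.0, 0.5) for _ in range(10_000)]


def batch_loss_spread(batch_size, num_batches=200):
    """Standard deviation of the mean loss across randomly drawn batches."""
    batch_means = [
        statistics.fmean(rng.sample(per_sample_losses, batch_size))
        for _ in range(num_batches)
    ]
    return statistics.stdev(batch_means)


spread_small = batch_loss_spread(batch_size=2)
spread_large = batch_loss_spread(batch_size=64)

print(spread_small, spread_large)
```

The reported loss jumps around far more with the small batch even though nothing about the model differs. Smoothing the curve (for example with a moving average) or evaluating on a fixed validation set helps tell this noise apart from real non-convergence.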

Answered on June 26, 2022

Enthusiastic netizens

Check whether the input data is normal. Is there any abnormal data (all zeros, wrong ground-truth labels)?

Is the data normalized?

Is the input consistent with the input format the pretrained model originally expected?

Is the data preprocessing correct?

Simplify the problem.

Check your loss function.

Check for missing inputs.

Check any custom network layers.

Check the settings of the frozen layers.

Check whether the dimensions match.

Check that gradients propagate back.

Check the model's initial parameters.

Are the hyperparameters set reasonably?

Reduce regularization.

Switch correctly between training mode and evaluation mode.

Visualize training (weights, activations, weight histograms, layer updates).

Try different optimizers.

Check for exploding and vanishing gradients (inspect the gradient values).

Adjust the learning rate (it should sit an order of magnitude or so below the weights, e.g. weights of 0.1 and a learning rate of 0.001).

NaN values (reduce the learning rate, look for division by zero or by very small numbers, find the first place where NaN appears, and adjust the activation function).
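For the NaN item, two small helpers can make the search less painful. This is a sketch with made-up names: `first_non_finite` locates the first bad entry in a logged sequence, and `safe_divide` guards a division by adding a small epsilon, assuming the denominator is non-negative (a norm or a standard deviation, for instance).

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or inf in values, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


def safe_divide(numerator, denominator, eps=1e-8):
    """Divide with a small epsilon added; assumes denominator >= 0."""
    return numerator / (denominator + eps)


loss_history = [0.9, 0.7, float("inf"), float("nan")]
first_bad_step = first_non_finite(loss_history)

ratio = safe_divide(1.0, 0.0)   # finite instead of raising ZeroDivisionError

print(first_bad_step, ratio)
```

Running the same check on each layer's output or gradient, in order, points at the first operation that produced the bad value, which is usually where the fix belongs.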