Introduction to the theoretical basis of machine learning

Being a qualified development engineer is not simple. It requires a range of capabilities, from development to debugging to optimization, and each of them takes considerable effort and experience to acquire. Becoming a qualified machine learning algorithm engineer (hereinafter "algorithm engineer") is even harder: in addition to the general skills of an engineer, you also need to master a sizable body of machine learning knowledge. Below we break down the skills involved and look at what it takes to be considered a qualified algorithm engineer.

To become a machine learning algorithm engineer, you need these essential skills

Figure 1 Machine learning algorithm engineer skill tree

Basic development ability

An algorithm engineer is an engineer first, so it is necessary to master the capabilities that every development engineer needs. Some students misunderstand this point. They think an algorithm engineer only has to think up and design the algorithm, without caring how it is implemented, and that someone else will implement the solution for them. This view is wrong. In most positions at most companies, the algorithm engineer is responsible for the whole process, from designing the algorithm to implementing it to deploying it to production. I have seen organizations adopt a structure that separates algorithm design from algorithm implementation. Under that structure, however, it is unclear who is responsible for how well the algorithm performs, and both the designers and the developers end up with plenty to complain about. The reasons are beyond the scope of this article, but I hope everyone remembers that basic development skills are something every algorithm engineer needs to master.

Basic development involves a great many skills; only two important ones are selected here for explanation.

Unit testing

In enterprise applications, a complete solution to a problem usually involves many steps, each of which goes through repeated rounds of optimization and debugging. How can such a complex task be divided up while still ensuring that the overall process is correct? The most practical method is unit testing. Unit testing is not just a testing skill; it is first and foremost a design capability. Not all code can be unit tested. The precondition is that the code can be divided into multiple units, that is, modules. Once the project has been broken down into modules that can be developed and tested independently, and each module has its own independent, repeatable unit tests, the correctness of each module can be verified. When every module is known to behave correctly, there is a much stronger basis for trusting the correctness of the overall process.

For work like algorithm development, where the process changes frequently, modular design and unit testing are important safeguards against creating pitfalls for yourself and others. They are also what allows you to change your code with confidence.
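As a minimal sketch of what an independently testable module can look like, the example below uses Python's standard `unittest` framework. The function `min_max_scale` and the test class name are hypothetical and chosen only for illustration; they are not taken from any particular project.

```python
import unittest


def min_max_scale(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    """Independent, repeatable checks for one small module."""

    def test_scales_into_unit_interval(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_divide_by_zero(self):
        self.assertEqual(min_max_scale([3, 3, 3]), [0.0, 0.0, 0.0])

    def test_empty_input(self):
        self.assertEqual(min_max_scale([]), [])
```

If this were saved in a file, the tests could be run with `python -m unittest` followed by the file name. Because the function depends only on its input, each test can be rerun after any change to confirm that the module still behaves as intended.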

Abstraction and reuse of logic

Abstracting and reusing logic is arguably one of the most important principles in all software development. One important measure of a programmer's coding ability is the proportion of duplicated and near-duplicated code in their work. Large amounts of repetitive or similar code usually reflect lazy thinking on the engineer's part, because copying and pasting feels like the most convenient approach. The result is not only ugly but also error-prone, and it is hard to maintain.

Algorithm development projects often contain a lot of similar logic, such as applying the same kind of processing to multiple features, or the many similar steps in ETL of raw data. If the repeated logic is not abstracted, the code turns into line after line of near-identical statements, which is cumbersome both to read and to maintain.
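One way to avoid a copy-pasted block per feature is to write each transformation once and describe which feature uses it in a single table. The sketch below is only an illustration: the feature names (`click_count`, `view_count`, `age`), the bounds, and the helper names are all hypothetical.

```python
import math


def log_transform(value):
    """Compress a non-negative, heavy-tailed value with log(1 + x)."""
    return math.log1p(value)


def clip(value, low, high):
    """Limit a value to the closed interval [low, high]."""
    return max(low, min(high, value))


# One declarative table replaces a copy-pasted block per feature.
# Feature names and bounds are hypothetical.
FEATURE_TRANSFORMS = {
    "click_count": log_transform,
    "view_count": log_transform,
    "age": lambda v: clip(v, 0, 100),
}


def transform_record(record, transforms=FEATURE_TRANSFORMS):
    """Apply the configured transform to each known feature.

    Features without an entry in `transforms` are passed through unchanged.
    """
    return {
        name: transforms[name](value) if name in transforms else value
        for name, value in record.items()
    }
```

With this layout, adding a feature that needs an existing transformation means adding one line to the table, and fixing a bug in a transformation fixes it for every feature that uses it.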

Foundations of probability and statistics

Probability and statistics are among the cornerstones of machine learning. From a certain perspective, machine learning can be regarded as a systematic way of thinking about and understanding an uncertain world, built on probabilistic reasoning. Learning to look at problems from a probabilistic perspective and to describe them in probabilistic language is one of the most important foundations for understanding machine learning techniques deeply and using them proficiently.

Probability theory covers a great deal of material, but most of it is embodied in specific distributions, so the commonly used probability distributions and their properties are an important part of learning probability. For discrete data, the Bernoulli, binomial, multinomial, and Poisson distributions all need to be understood, together with the Beta and Dirichlet distributions, which are commonly used as priors over the parameters of the Bernoulli/binomial and multinomial distributions respectively; for continuous data, the Gaussian and exponential distributions are important. These distributions run through the various models of machine learning and also appear in all kinds of data from the Internet and the real world. Only by understanding how your data is distributed can you know how to handle it.
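To make two of the discrete distributions mentioned above concrete, the following sketch writes out their probability mass functions using only Python's `math` module. The function names are my own, and the parameter values in the usage note are arbitrary examples.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): k events when lam are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# The probabilities of all possible outcomes of a binomial sum to 1
# (up to floating-point rounding).
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```

A property worth knowing is that the Poisson distribution approximates the binomial when `n` is large and `p` is small, with `lam = n * p`. For example, `binomial_pmf(2, 1000, 0.003)` and `poisson_pmf(2, 3.0)` give nearly the same value.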

In addition, the theory of hypothesis testing needs to be mastered. In this so-called big data era, the most deceptive thing is probably the data itself. Only by mastering hypothesis testing, confidence intervals, and related theory can we judge whether a conclusion drawn from data is trustworthy: for example, whether two sets of data really differ, or whether a metric really improved after a strategy went live. Such questions are very common in practical work, and without this ability you are effectively blind with your eyes open in the big data era.
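As one possible illustration of the "did the metric really improve" question, the sketch below implements a two-sided two-proportion z-test with the standard library only. It assumes the metric is a conversion rate, that samples are large enough for the normal approximation, and that the two groups are independent. The function name and the numbers in the usage note are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Assumes large, independent samples so that the
    normal approximation to the binomial is reasonable.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% rate against a 15% rate with 1000 samples each. A small p-value suggests the observed difference would be unlikely if the two groups truly shared the same rate, while identical counts in both groups give `z = 0` and a p-value of 1.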

In terms of statistics, some commonly used parameter estimation methods also need to be mastered, such as maximum likelihood estimation, maximum a posteriori estimation, and the EM algorithm. Like optimization theory, these methods apply across many different models and are the most basic of foundations.
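The difference between maximum likelihood and maximum a posteriori estimation can be seen in the simplest case: estimating the success probability of a Bernoulli variable. The sketch below assumes a Beta(a, b) prior with a > 1 and b > 1 for the MAP estimate; the function names and the example counts in the usage note are illustrative only.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of p: the observed success frequency k / n."""
    return successes / trials


def bernoulli_map(successes, trials, a, b):
    """Maximum a posteriori estimate of p under a Beta(a, b) prior.

    The posterior is Beta(a + k, b + n - k), whose mode is
    (k + a - 1) / (n + a + b - 2). Assumes a > 1 and b > 1.
    """
    return (successes + a - 1) / (trials + a + b - 2)
```

With 7 successes in 10 trials, `bernoulli_mle(7, 10)` is 0.7, while `bernoulli_map(7, 10, 2, 2)` is 8/12, pulled slightly toward 0.5 by the prior. As the number of trials grows, the influence of the prior shrinks and the two estimates converge.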

Machine learning theory

Although more and more open source toolkits work out of the box, this does not mean that algorithm engineers can skip learning and mastering the basic theory of machine learning. There are two main reasons for this:
