Understanding Machine Learning and Its Algorithms in Three Minutes

1. Development background of machine learning: artificial intelligence

Artificial intelligence (AI) is a young discipline that simulates human consciousness and thought processes. Today it has moved from science fiction into reality. Computer scientists have made major breakthroughs in machine learning and deep learning, the core of artificial intelligence, giving machines powerful cognitive and predictive capabilities. Looking back at history: in 1997, IBM's Deep Blue defeated chess champion Garry Kasparov; in 2011, IBM's Watson, equipped with machine learning capabilities, won $1 million on a quiz show; in 2016, AlphaGo, trained with deep learning, defeated the human world champion at Go. These events show that machines can also "think" like humans, and in some tasks even outperform them.

At present, artificial intelligence is widely used in finance, medical care, manufacturing, and other industries. Global investment soared from US$589 million in 2012 to more than US$5 billion in 2016, and McKinsey predicts that the total value of the artificial intelligence application market will reach US$127 billion by 2025. In an in-depth analysis of 2016 investment in the artificial intelligence market, McKinsey also found that nearly 60% of capital mergers and acquisitions revolved around machine learning, and that software-based machine learning startups attracted more investment than hardware-based robotics companies. From 2013 to 2016, the compound annual growth rate of investment in this area reached about 80%. This shows that machine learning has become the main direction of artificial intelligence technology.

2. The relationship between artificial intelligence, machine learning, and deep learning

Before introducing machine learning, it is necessary to sort out the relationship among artificial intelligence, machine learning, and deep learning. The most common division in the industry is as follows:

Artificial intelligence uses a working mode completely different from that of traditional computer systems. Based on a general learning strategy, it can read massive amounts of "big data" and discover rules, connections, and insights in them. Artificial intelligence can therefore adjust automatically as new data arrives, without the program having to be rewritten.

Machine learning is the core technology of artificial intelligence research. Supported by big data, various algorithms let machines perform deep statistical analysis of data for "self-study". Through machine learning, artificial intelligence systems acquire the abilities of inductive reasoning and decision-making, and deep learning pushes these abilities to an even higher level.

Deep learning is a class of machine learning algorithms belonging to the family of artificial neural networks. In many application fields, the best-performing machine learning systems today are designed around neural networks that imitate the structure of the human brain. These systems can learn, discover, and apply rules largely autonomously. Compared with other methods, deep learning performs better on more complex problems; it is one way of helping machines achieve "independent thinking".

In short, artificial intelligence is an important driving force of social development, and machine learning, especially deep learning, is the core of its development. The relationship among the three is one of successive containment, as shown in Figure 1.

Figure 1 The relationship between artificial intelligence, machine learning, and deep learning

3. Machine learning: an efficient way to achieve artificial intelligence

In a broad sense, machine learning is a method that gives machines the ability to perform functions that direct programming cannot achieve. In a practical sense, machine learning is a method of using data to train a model and then using that model to make predictions. Foreign scholars define machine learning in much the same way: some hold that machine learning is the study of computer algorithms that improve automatically through experience; others hold that machine learning means using data or past experience to optimize the performance criteria of a computer program. In other words, machine learning is the study of improving algorithms through experience or data. Through its algorithms, a machine learns regularities from a large amount of historical data, obtains a model, and uses this model to predict the future. The more data the machine processes during learning, the more accurate its predictions become.

Machine learning occupies a very important position in artificial intelligence research. It is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications span every field of artificial intelligence. Research on machine learning began in the 1950s: from early work based on neuron models and function approximation theory, to rule learning and decision tree learning based on symbolic calculus, to the introduction of concepts such as induction, explanation, and analogy from cognitive psychology, and on to the rise of computational learning theory and statistical learning, machine learning has always played a leading role in the practical application of related disciplines. Many achievements have been made, and many research directions have branched off, chiefly symbolic learning, connectionist learning, and statistical learning.

(1) The development of machine learning

The development process of machine learning can be roughly divided into the following four stages:

1. Mid-1950s to mid-1960s

Research in this period concerned "knowledge-free" learning, that is, learning "from ignorance". Its targets were various self-organizing and adaptive systems, and its main method was to continuously modify a system's control parameters to improve its execution ability, without involving knowledge related to specific tasks. The theoretical basis guiding this research was the neural network model, which had been studied as early as the 1940s; with the emergence and development of electronic computers, implementing machine learning became possible. Research in this stage gave birth to the new science of pattern recognition and produced two important machine learning methods: the discriminant function method and evolutionary learning. Samuel's checkers program is a classic example of the discriminant function method. However, such knowledge-free learning systems had great limitations: whether based on neural models, evolutionary learning, or discriminant functions, their results were very limited and fell far short of people's expectations. During this period, China developed a digit-recognition learning machine.

2. Mid-1960s to mid-1970s

The research goal of this stage was to simulate the human process of concept learning, using logical structures or graph structures as the machine's internal representation. Machines could describe concepts with symbols (symbolic concept acquisition) and form various hypotheses about the concepts being learned. Representative work includes Winston's structural learning system and Hayes-Roth's logic-based inductive learning system. Although such learning systems achieved notable success, they could only learn a single concept and were never put to practical use. In addition, neural network learning machines fell into a trough because theoretical flaws prevented them from achieving the expected results. This disappointed those who had placed too much hope in the progress of machine learning, and the period came to be called "the dark period".

3. Mid-1970s to mid-1980s

During this period, researchers expanded from learning a single concept to learning multiple concepts and explored different learning strategies and methods. The machine learning process was generally built on a large-scale knowledge base to realize knowledge-reinforced learning. Particularly encouraging was that learning systems began to be integrated with various applications, with great success, which propelled the development of machine learning. After the first expert learning system appeared, example-based inductive learning systems became the mainstream of research, and automatic knowledge acquisition became machine learning's applied research goal. In 1980, the first International Conference on Machine Learning was held at Carnegie Mellon University (CMU) in the United States, marking the worldwide rise of machine learning research; thereafter, machine inductive learning entered practical application. In 1986, the international journal Machine Learning was launched, ushering in a new era of vigorous development. In the late 1970s, the Institute of Automation of the Chinese Academy of Sciences conducted research on mass spectrometry analysis and pattern grammar inference, showing that machine learning research in China had resumed. After Simon came to China in 1980 to spread the flame of machine learning, a new phase of machine learning research opened up in China.

4. The latest stage, from 1986 to the present

On the one hand, thanks to the re-emergence of neural network research, the study of connectionist learning methods flourished, and machine learning research reached a new climax worldwide; basic theory and integrated-systems research were strengthened and developed. On the other hand, experimental and applied research received unprecedented attention, giving machine learning stronger research methods and environments. As a result, branches such as symbolic learning, neural network learning, evolutionary learning, and behaviorism-based reinforcement learning contended side by side.

Figure 2 The development of machine learning

(2) Structural model of machine learning

The essence of machine learning is the algorithm. An algorithm is a series of instructions for solving a problem, and the algorithms programmers develop to guide computers through new tasks are the foundation of the advanced digital world we see today. Computer algorithms organize large amounts of data into information and services according to given instructions and rules. Machine learning instead gives the computer instructions that allow it to learn from data, without the programmer writing new step-by-step instructions.

The basic process of machine learning is to provide training data to a learning algorithm. The learning algorithm then generates a new set of rules based on inferences from the data; this is, in essence, the generation of a new algorithm, called a machine learning model. With different training data, the same learning algorithm can generate different models. Inferring new instructions from data is the core advantage of machine learning, and it highlights the key role of data: the more data available to train the algorithm, the more the algorithm learns. In fact, many of the latest advances in AI are due not to radical innovations in learning algorithms but to the enormous amount of data now accumulated and available.

Figure 3 Structural model of machine learning

(3) How machine learning works

1. Select data: first divide the raw data into three groups: training data, validation data, and test data;

2. Model the data: use the training data to build a model with the relevant features;

3. Validate the model: feed the validation data into the constructed model to assess and tune it;

4. Test the model: use the test data to check the performance of the validated model;

5. Use the model: use the fully trained model to make predictions on new data;

6. Tune the model: use more data, different features, or adjusted parameters to improve the algorithm's performance.

Figure 4 How machine learning works
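To make the workflow concrete, here is a minimal sketch in Python using scikit-learn (an assumed environment; the built-in Iris dataset and the choice of model are illustrative stand-ins, not prescribed by the article):

```python
# A minimal sketch of the six-step workflow, assuming Python with
# scikit-learn and its bundled Iris dataset as stand-in data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# 1. Select data: split into training, validation, and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 2. Model the data: fit a model on the training data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 3. Validate the model: performance on the validation set guides tuning.
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

# 4. Test the model: final check on held-out test data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Use the model: predict on new, unseen samples.
print("prediction:", model.predict(X_test[:1]))

# 6. Tuning would revisit features, hyperparameters, or data volume.
```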

(4) The key cornerstones of machine learning development:

(1) Massive data: the energy source of artificial intelligence is a steady flow of data. Machine learning can "train" itself with massive amounts of data and so develop new rules to accomplish increasingly complex tasks. There are currently more than 3 billion people online and about 17 billion connected devices or sensors generating vast amounts of data, and the falling cost of data storage makes this data easy to use.

(2) Supercomputing: powerful computers and the remote processing capacity connected through the Internet make it possible to process massive data with machine learning techniques. According to media reports, AlphaGo's historic victory in the match against Lee Sedol was inseparable from its hardware: a supercomputing system configured with 1,920 CPUs and 280 GPUs. This shows that computing power is crucial to machine learning.

(3) Excellent algorithms: in machine learning, learning algorithms create the rules that allow computers to learn from data and infer new instructions (algorithm models); this is machine learning's core advantage. New machine learning techniques, especially layered neural networks, also known as "deep learning", have inspired new services and stimulated investment and research in other areas of artificial intelligence.

Figure 5 Key cornerstones of machine learning

(5) Classification of machine learning algorithms

Machine learning can usually be divided into three categories based on the different forms of learning:

1. Supervised Learning
The learning algorithm is provided with labeled data and the desired output: for each input, the learner is given a target response. Supervised learning is mainly used to rapidly and efficiently impart existing knowledge to an AI system and is applied to classification and regression problems. Common algorithms include:

(1) Decision Trees: a decision tree can be regarded as a tree-shaped prediction model. It classifies instances by sorting them from the root node down to a leaf node, where the leaf node gives the classification of the instance. The core problems of decision trees are the choice of split attributes and the pruning of the tree. A decision tree is a decision support tool that uses a tree diagram or model to represent decisions and their possible consequences, including the effects of random events, resource consumption, and utility. A decision tree used to analyze and judge loan intention is shown in the figure. From a business perspective, decision trees are often used for rule-based credit assessment and for predicting horse racing results.

Figure 6 Decision tree
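A minimal sketch of such a tree, using scikit-learn (an assumed environment; the loan features and data below are hypothetical, invented only for illustration):

```python
# A toy decision tree: instances are routed from the root to a leaf.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical loan data: [income (k$), years employed]; label 1 = approve.
X = [[20, 1], [50, 5], [80, 10], [30, 2], [90, 8], [25, 0]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["income", "years_employed"]))
print(tree.predict([[60, 4]]))  # route a new applicant from root to a leaf
```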

(2) AdaBoost algorithm: this is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (strong classifier). The algorithm works by changing the data distribution: it adjusts the weight of each sample according to whether it was classified correctly in the previous round and according to the accuracy of the previous overall classification. The reweighted data are passed to the next classifier for training, and the classifiers obtained from each round are finally fused into the decision classifier. AdaBoost mainly addresses two-class problems, multi-class single-label problems, multi-class multi-label problems, large-class single-label problems, and regression problems. Its advantages are a marked increase in learning accuracy and relatively little tendency to overfit. AdaBoost is commonly used in face recognition and target tracking.

Figure 7 Adaboost
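A short sketch with scikit-learn (assumed environment, synthetic data); by default scikit-learn's AdaBoost uses depth-1 decision stumps as the weak classifiers:

```python
# AdaBoost: weak learners trained in rounds, misclassified samples
# reweighted, and the ensemble fused into a stronger classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# The default weak learner is a depth-1 decision stump.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```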

(3) Artificial Neural Network (ANN) algorithm: an artificial neural network is a nonlinear, adaptive information processing system composed of a large number of interconnected processing units. It was proposed on the basis of modern neuroscience and attempts to process information by simulating the way the brain's neural networks process and memorize information. An artificial neural network is a parallel distributed system that uses a mechanism completely different from traditional artificial intelligence and information processing techniques. It overcomes the shortcomings of logic-symbol-based traditional artificial intelligence in handling intuitive, unstructured information, and has the features of self-organization and real-time learning.

Figure 8 Artificial Neural Network
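A small feed-forward network, sketched with scikit-learn's multi-layer perceptron (an assumed environment) on the classic XOR problem, which a linear model cannot solve but a hidden layer can:

```python
# A tiny neural network learning XOR; hidden units give it the
# nonlinearity that logic-symbol methods and linear models lack.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: not linearly separable

net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print(net.predict(X))  # should recover [0, 1, 1, 0]
```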

(4) SVM (Support Vector Machine): the SVM method was proposed by Vapnik et al. in 1995 and has comparatively excellent performance. It is a machine learning method grounded in statistical learning theory. SVM is a binary classification algorithm: suppose there is a set of points of two types in an N-dimensional space; the SVM generates an (N-1)-dimensional hyperplane that separates the points into two groups. For example, given some linearly separable points on paper, the SVM finds a straight line that divides the points into two categories while staying as far away from all of them as possible. At scale, SVMs (properly tuned) solve some very big problems: advertising, human gene splice site recognition, image-based gender detection, and large-scale image classification. They are well suited to applications such as news classification and handwriting recognition.

Figure 9 Support Vector Machine Algorithm
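A sketch of the "straight line far from all points" idea, with scikit-learn (assumed environment) on two synthetic clusters:

```python
# A linear SVM finds the maximum-margin hyperplane between two classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two hypothetical, roughly linearly separable clusters in 2-D.
X = np.vstack([rng.normal(loc=-2, size=(50, 2)), rng.normal(loc=2, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear").fit(X, y)
print("support vectors:", len(svm.support_vectors_))  # points touching the margin
print(svm.predict([[-1.5, -1.5], [1.5, 2.0]]))
```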

(5) Naive Bayes: the Bayesian method is a pattern classification method used when the prior probabilities and class-conditional probabilities are known; the classification of a sample to be assigned depends on the totality of samples in each class domain. The naive Bayes classifier applies Bayes' theorem under a strong independence assumption between features. Advantages: it remains effective with little data and can handle multi-class problems. Disadvantages: it is rather sensitive to how the input data are prepared. Applicable data type: nominal data. Real-life applications include email spam filtering, classification of article topics, text sentiment analysis, face recognition, and consumer segmentation.

Figure 10 Naive Bayes algorithm
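A toy spam filter is a natural sketch here; this uses scikit-learn (assumed environment) with hypothetical messages and word counts as the conditionally independent features:

```python
# Naive Bayes spam filtering on toy data: word counts as features,
# class probabilities via Bayes' theorem with independence assumed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free cash click now", "lunch with the team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(messages)

nb = MultinomialNB().fit(X, labels)
print(nb.predict(vec.transform(["free prize click"])))  # likely spam
```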

(6) K-Nearest Neighbors (KNN): this is a classification algorithm whose core idea is that if the majority of the k nearest neighbors of a sample in feature space belong to a certain category, then the sample also belongs to that category and shares the characteristics of samples in it. In making a classification decision, the method relies only on the class of the nearest sample or samples. Because kNN depends mainly on a limited number of surrounding neighbors rather than on discriminating class domains, it is more suitable than other methods for sample sets whose class domains overlap heavily.

Figure 11 K-nearest neighbor algorithm
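A minimal sketch with scikit-learn (assumed environment; the points are hypothetical):

```python
# kNN: a new point takes the majority label of its k closest neighbors.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [8, 7]]))  # -> [0 1]: each follows its neighborhood
```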

(7) Logistic Regression: this is a classification algorithm mainly used for binary classification problems. Logistic regression is a powerful statistical method that models a binomial outcome with one or more explanatory variables. It estimates probabilities using the logistic function of the cumulative logistic distribution and thereby measures the relationship between the categorical dependent variable and one or more independent variables. Logistic regression is a nonlinear model: compared with linear regression, it adds a sigmoid (logistic) function. In real life, regression is commonly used for credit evaluation, measuring marketing success, predicting the revenue of a product, predicting whether an earthquake will occur on a specific day, road traffic flow analysis, and mail filtering.

Figure 12 Logistic regression algorithm
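A sketch of the credit-evaluation use case with scikit-learn (assumed environment; features and data are hypothetical):

```python
# Logistic regression: a linear combination of features squashed
# through the sigmoid yields a class probability.
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [monthly income (k$), number of existing debts].
X = [[2, 3], [3, 2], [8, 1], [9, 0], [4, 4], [10, 1]]
y = [0, 0, 1, 1, 0, 1]  # 1 = creditworthy

lr = LogisticRegression().fit(X, y)
print(lr.predict_proba([[6, 2]]))  # class probabilities via the sigmoid
```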

(8) Random Forest algorithm: random forests can be used for regression, classification, clustering, and survival analysis. When used for classification or regression, the main idea is to generate many tree regressors or classifiers by bootstrap resampling. In machine learning, a random forest is a classifier containing multiple decision trees whose output category is the mode of the categories output by the individual trees. It is often used for customer churn analysis and risk assessment.

Figure 13 Random Forest Algorithm
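A sketch with scikit-learn (assumed environment, synthetic data):

```python
# A random forest: many trees on bootstrap resamples; the prediction
# is the mode (classification) of the individual trees' outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", rf.score(X, y))
print("feature importances:", rf.feature_importances_.round(2))
```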

(9) Linear Regression: this is a statistical analysis method that uses regression analysis from mathematical statistics to determine quantitative interdependence between two or more variables, and it is very widely applied. Linear regression was the first type of regression analysis to be studied rigorously and used extensively in practice, because a model that depends linearly on its unknown parameters is easier to fit than one that depends on them nonlinearly, and the statistical properties of the resulting estimators are easier to determine.

Figure 14 Linear regression algorithm
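A sketch of ordinary least squares recovering a noisy line, with NumPy and scikit-learn (assumed environment; the true slope and intercept are invented for illustration):

```python
# Ordinary least-squares fit of y ≈ a*x + b to noisy synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x.ravel() + 2.0 + rng.normal(scale=0.5, size=100)  # noisy line

reg = LinearRegression().fit(x, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)  # close to 3 and 2
```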

2. Unsupervised Learning

The data provided to the learning algorithm are unlabeled, and the algorithm is asked to identify patterns in the input: it builds a model that tries to explain the input data and uses it on subsequent inputs. In reality, many data sets contain a large number of unlabeled samples and relatively few labeled ones; simply discarding the unlabeled data would substantially lower model accuracy. A common remedy is to combine the labeled samples with unlabeled samples turned into pseudo-labeled samples by estimation; accordingly, unsupervised learning is harder to master than supervised learning. It is mainly used to solve clustering and dimensionality reduction problems. Common algorithms include:

(1) Clustering algorithms: the task of grouping a set of objects so that objects in the same group are more similar to one another than to objects in other groups. Common clustering algorithms include:

① K-means algorithm: this is the representative of prototype-based objective-function clustering methods. It takes a distance from the data points to the prototypes as the objective function to be optimized, and derives the iterative update rules by seeking the extrema of that function. Its advantage is that the algorithm is fast and simple; if preprocessing and feature engineering are effective, the clustering can be highly flexible. Its disadvantage is that the number of clusters must be specified, and the choice of K is usually not easy; moreover, if the true clusters in the training data are not spherical, K-means will produce some poor clusters.

Figure 15 K-means algorithm
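A sketch with scikit-learn (assumed environment, synthetic blobs); note that K must be chosen in advance, as the text says:

```python
# K-means alternates between assigning points to the nearest centroid
# and moving each centroid to the mean of its cluster.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # K fixed up front
print("centroids:\n", km.cluster_centers_)
print("first 10 labels:", km.labels_[:10])
```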

② Expectation Maximisation (EM): this is an iterative algorithm for maximum likelihood estimation or maximum a posteriori estimation of probabilistic models containing latent variables. The main purpose of the EM algorithm is to provide a simple iterative scheme for computing the posterior density function. Its biggest advantages are simplicity and stability, but it easily falls into local optima.

Figure 16 EM algorithm
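A common concrete use of EM is fitting a Gaussian mixture model, where the E-step computes soft cluster responsibilities and the M-step re-estimates the Gaussian parameters; a sketch with scikit-learn (assumed environment):

```python
# EM in action: fitting a two-component Gaussian mixture model.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # runs EM internally
print("means:\n", gmm.means_)
print("soft assignment of first point:", gmm.predict_proba(X[:1]).round(3))
```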

③ Affinity Propagation (AP) clustering: AP is a relatively new clustering algorithm that determines clusters based on graph distances between pairs of sample points, producing clusters that are smaller and of unequal size. Advantage: the algorithm does not require the number of clusters to be specified. Disadvantages: training is relatively slow and memory-hungry, so it is difficult to scale to large data sets; in addition, the algorithm assumes the underlying clusters are spherical.

④ Hierarchical Clustering: hierarchical clustering is a family of clustering algorithms based on the following idea: the data set is decomposed hierarchically according to some method until a certain condition is met. Depending on the grouping principle, it divides into agglomerative and divisive methods. Advantages: clusters no longer need to be assumed spherical, and the method can scale to large data sets. Disadvantage: somewhat like K-means, the algorithm needs the number of clusters (or an equivalent cut level) to be set.

Figure 17 Hierarchical clustering algorithm
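A sketch of the agglomerative (bottom-up) variant with scikit-learn (assumed environment):

```python
# Agglomerative hierarchical clustering: every point starts as its own
# cluster and the closest pairs of clusters are merged step by step.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
hc = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print("cluster sizes:", [list(hc.labels_).count(c) for c in (0, 1, 2)])
```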

⑤ DBSCAN: this is a representative density-based clustering algorithm. Unlike partitioning and hierarchical methods, it defines a cluster as the largest set of density-connected points, can divide regions of sufficiently high density into clusters, and can find clusters of arbitrary shape in a spatial database containing noise: it forms clusters from the dense regions of sample points. Advantages: DBSCAN does not need to assume spherical clusters, and its performance is scalable; moreover, it does not require every point to be assigned to a cluster, which reduces the anomalous points inside clusters. Disadvantages: the user must tune the two hyperparameters that define cluster density, "epsilon" and "min_samples", to which DBSCAN is very sensitive.

Figure 18 DBSCAN algorithm
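A sketch with scikit-learn (assumed environment) on the two-moons data, a non-spherical shape where DBSCAN shines; `eps` and `min_samples` are the two density hyperparameters named above:

```python
# DBSCAN grows clusters from dense regions; points in sparse regions
# are labeled -1 (noise) rather than forced into a cluster.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print("clusters found:", len(set(db.labels_)) - (1 if -1 in db.labels_ else 0))
print("noise points:", int(np.sum(db.labels_ == -1)))
```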

(2) Dimensionality reduction algorithms: their main feature is reducing data from high dimensionality to low dimensionality while retaining as much of the information in the data as possible. Representative algorithms are:

① Principal Component Analysis (PCA): principal component analysis aims to use the idea of dimensionality reduction to convert many indicators into a few comprehensive indicators (the principal components), each of which reflects most of the information in the original variables, with the information they contain not overlapping. While admitting many variables, this method condenses the complex factors into a few principal components, simplifying the problem and at the same time yielding more scientific and effective data information.

Figure 19 PCA algorithm
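A sketch with scikit-learn (assumed environment) compressing the 4-dimensional Iris data to two components:

```python
# PCA projects data onto the directions of greatest variance,
# reducing dimensionality while retaining most of the information.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # 4 features compressed to 2 components
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```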

② Locally Linear Embedding (LLE): a nonlinear dimensionality reduction algorithm that lets the reduced data keep the original manifold structure. The algorithm optimizes the dimensionality of nonlinear signal feature vectors; this optimization is not merely a reduction in number but a mapping of signals from a high-dimensional space to a low-dimensional one while keeping the properties of the original data unchanged, that is, a second extraction of the feature values.

Figure 20 LLE dimensionality reduction algorithm
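A sketch with scikit-learn (assumed environment) on the classic Swiss-roll manifold:

```python
# LLE reconstructs each point from its neighbors and preserves those
# local weights in the low-dimensional map, keeping manifold structure.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)  # the 3-D roll unrolled into 2-D
print(X_2d.shape)
```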

3. Reinforcement Learning
The algorithm interacts with a dynamic environment, takes the environment's feedback as input, and through learning selects the optimal action for reaching its goal. The mathematics behind reinforcement learning differs slightly from that of supervised and unsupervised learning: the latter lean on statistics, while reinforcement learning draws more on discrete mathematics and stochastic processes. Common algorithms include:

① TD(λ) algorithm: temporal difference (TD) learning is one of the most important techniques in reinforcement learning. TD learning combines Monte Carlo ideas with dynamic programming: on the one hand, a TD algorithm can learn directly from the agent's experience without a model of the system; on the other hand, like dynamic programming, it iterates using an estimated value function.

Figure 21 TD (λ) algorithm

② Q-learning algorithm: Q-learning is a model-free reinforcement learning algorithm, also called off-policy TD learning. Unlike the TD algorithm, Q-learning iterates using the reward of each state-action pair and Q(s, a) as the estimated function. During each learning iteration, the agent needs to examine every action to ensure that the learning process converges.

Figure 22 Q_learning algorithm
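A pure-NumPy sketch of tabular Q-learning on a tiny hypothetical corridor (the environment, reward, and constants are assumptions invented for illustration): states 0 to 4, actions left/right, and reward 1 only for reaching state 4.

```python
# Tabular Q-learning on a 5-state corridor; state 4 is the goal.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = rng.uniform(0.0, 0.01, size=(n_states, n_actions))  # tiny random init breaks ties
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration rate

for _ in range(300):                  # training episodes
    s = 0
    while s != 4:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        # off-policy update: bootstrap on the best action in the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))           # greedy policy; should be all 1s (go right)
```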

(6) An example of the machine learning process

The so-called machine learning process means observing a set of n samples and predicting the properties of unknown data from them. Given a data set (so-called big data) and a specific problem, the steps to solve the problem can generally be summarized as follows:

1. Data abstraction

Abstract the data set and the specific problem into mathematical language, expressed with appropriate mathematical symbols. This naturally facilitates formulating and solving the problem, and it is also more intuitive.

2. Set performance metrics

Machine learning generates models via algorithms, and models generally have errors. If a model fits the training data too well, it treats particular characteristics of the training samples as general properties of all potential samples; this situation is called overfitting, and such a model will show large errors on new samples (in professional terms, the model's generalization performance declines). The opposite, underfitting, means the model has not learned the general properties of the samples; this is usually easier to fix by expanding the data set or adjusting the model.

3. Data preprocessing

Data preprocessing is needed because the data sets provided can rarely be used directly. For example, if the samples have too many attributes, there are generally two remedies: dimensionality reduction and feature selection. Feature selection is easy to understand: choose the useful, relevant attributes, or put another way, select those features in the samples that are useful for and related to the problem.

4. Select a model

Once the data set is ready, the next step is to select an appropriate model for the specific problem. One criterion is whether the samples are labeled: with labeled samples, consider supervised learning; without labels, unsupervised learning; and when only part of the data is labeled, semi-supervised learning may come in handy.

5. Training and optimization

With the model selected, how to train and optimize it is also an important question. To evaluate how well the training set and validation set are divided, the common methods are hold-out, cross-validation, and the bootstrap, combined with model tuning. If the model's computation takes too long, pruning can be considered. Regularization terms can be introduced to suppress overfitting (penalizing model complexity). If a single model is not effective enough, multiple learners can be integrated through some strategy to combine their strengths (ensemble learning).

6. Machine learning example analysis

In machine learning, features matter more than models (learning algorithms). If the features are well chosen, we may be able to use simple rules to determine the final result, even without a model. Take judging whether a person is a man or a woman, assuming a computer performs the task. First, various data are collected (features: name, height, hair length, birthplace, smoking, and so on). Statistics tell us that men are generally taller than women, have shorter hair, and are more likely to smoke, so these features have a certain discriminative power, although there are always counterexamples; even with the best algorithm the accuracy may not reach 100%. Feature extraction follows: keep the features meaningful for the target and delete the irrelevant ones (such as birthplace). Then comes preprocessing, which reworks the extracted features to strengthen their representational power and to keep the model (classifier) from becoming too complex and hard to learn. Next is training: supervised or unsupervised methods are used to fit a classifier model; by analyzing the regularities of the data, the learner tries to fit a function between the data and the learning target so that the overall error defined on the training set is as small as possible, and the learned function can then be used to predict unknown data. Finally, the results are evaluated and improved.

Figure 23 An example of the machine learning process

(7) Scope covered by machine learning

In scope, machine learning is similar to pattern recognition, statistical learning, and data mining. At the same time, combining machine learning with processing techniques from other fields has formed interdisciplinary subjects such as computer vision, speech recognition, and natural language processing. For this reason, data mining is often loosely equated with machine learning. Likewise, what we usually call machine learning applications should be understood broadly: not limited to structured data but also covering images, audio, and the like.

(1) Pattern recognition

Pattern recognition = machine learning. The main difference between the two is that the former grew out of industry, while the latter derives mainly from computer science. In the well-known book Pattern Recognition and Machine Learning, Christopher M. Bishop writes at the outset: "Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years."

(2) Data mining

Data mining = machine learning + databases. Data mining is just a method, and not all data contains value; the key is the data mining mindset plus a deep understanding of the data, which makes it possible to derive patterns from data to guide business improvement. Most algorithms in data mining are optimizations of machine learning algorithms for databases.

(3) Statistical learning

Statistical learning is approximately equal to machine learning. Statistical learning is a discipline that overlaps heavily with machine learning, because most methods in machine learning come from statistics; it can even be argued that the development of statistics promoted the prosperity of machine learning. For example, the famous support vector machine algorithm came out of statistics. The difference between the two is that statistical learners focus on developing and optimizing statistical models and lean toward mathematics, while machine learners focus more on solving problems and lean toward practice, so machine learning researchers concentrate on improving the efficiency and accuracy with which learning algorithms run on computers.

(4) Computer vision

Computer vision = image processing + machine learning. Image processing techniques turn images into inputs suitable for machine models, and machine learning is responsible for recognizing relevant patterns in the images. Applications of computer vision are numerous, such as Baidu image search, handwritten character recognition, and license plate recognition. It will be a popular research direction in the future: with the development of deep learning within machine learning, the accuracy of computer image recognition has been greatly improved, so the prospects of the computer vision industry are inestimable.

(5) Speech recognition

Speech recognition = speech processing + machine learning. Speech recognition combines audio processing techniques with machine learning. Speech recognition technology is generally not used alone; it is usually combined with natural language processing techniques. A current application is Apple's voice assistant Siri.

(6) Natural language processing

Natural language processing = text processing + machine learning. Natural language processing is the field of making machines understand human language. It draws heavily on techniques related to compiler theory, such as lexical analysis and grammatical analysis, and at the level of understanding it uses semantic understanding, machine learning, and other technologies. Since natural language is the symbol system humans created for themselves, natural language processing has been a direction of continuous research in the machine learning world.

Figure 24 Scope of machine learning

(8) Main application scenarios of machine learning in industrial production

Machine learning, as the most effective implementation method of artificial intelligence, has been widely used in many scenarios such as industrial manufacturing. The following are five application scenarios of machine learning in industrial production.

1. Replacing visual inspection operations, realizing intelligent and unmanned manufacturing inspection

For example, the classification of engineering rock mass has mainly relied on careful identification by experienced engineers; the efficiency is relatively low, and judgments vary from person to person. With artificial intelligence, the engineers' experience is converted into a deep learning algorithm whose judgment accuracy is comparable to a human's. Once the corresponding weights are obtained, an app is developed; an engineer then only has to take a picture with a tablet, and the app automatically returns the rock mass classification, which is both efficient and accurate.

2. Significantly improving the operating performance of industrial robots, advancing automated and unmanned manufacturing

Many operations in industry require sorting. Manual sorting is slow and costly and requires a suitable working temperature environment, whereas industrial robots can greatly reduce cost and increase speed. Figure 25 shows an example: a bin-picking robot.

Figure 25 Bin-picking (parts sorting) robot

However, the parts to be picked are generally not neatly arranged. Although the robot's camera can see the parts, it does not know how to pick them up successfully. In this case, machine learning is used: the industrial robot first performs a random picking action and is then told whether the action successfully picked up a part or grasped nothing. After many rounds of training, the robot learns in what order to pick so as to achieve a higher success rate, as shown in Figure 26.

Figure 26 Bin picking with machine learning

As shown in Figure 27, after machine learning, the robot knows which position on a cylinder to grip for a higher picking success rate.

Figure 27 Accuracy rises with the number of training rounds

Figure 28 shows that after machine learning, the robot knows in what order to pick the parts for a higher success rate; the numbers in the figure give the picking order.

Figure 28 The robot determines the picking order

As Figure 29 shows, after eight hours of learning, the robot's picking success rate reaches 90%, comparable to that of a skilled worker.

Figure 29 The picking success rate improves substantially

3. Early detection of industrial robot anomalies, effectively avoiding the losses and impact of machine failures

There are large numbers of industrial robots on production lines. If one of them fails, by the time a person notices the failure, a large number of defective products may already have been produced, causing no small loss. If the failure can be detected in advance, preventive action can be taken and losses reduced. Take the industrial robot reducers in Figure 30: if they are fitted with sensors, and waveform, current, and other signals from normal and abnormal operation are collected in advance to train a machine learning system, the resulting model can provide early warnings. Field data show that the robot detects an impending failure earlier than a human can, thereby reducing losses.

Figure 30 Industrial robot failure prediction

After machine learning, the model can detect from the observed waveforms subtle changes that people can hardly perceive, and it can issue an effective warning weeks before the robot fails completely. Figure 31 shows machine learning used to give early warning of spindle failures, which are normally discovered only after the spindle has already gone wrong.

Figure 31 Spindle failure prediction

4. Computer-aided design of PCBs

Any printed circuit board must be assembled together with other structural parts, so its shape and dimensions must be based on the overall structure of the product. The manufacturing process must also be considered, and the number of layers depends on the required circuit performance, the board dimensions, and the density of the wiring. Without rich experience, it is difficult for a technician to design a suitable multilayer board. With machine learning, a system can turn technicians' experience into a model, improving the efficiency and success rate of PCB design, as shown in Figure 32.

Figure 32 Computer-aided PCB design

5. Quickly and efficiently finding real parts that match a 3D model

For example, after an industrial 3D model is designed, real-world parts matching the parameters of the 3D model must be found in order to manufacture the actual product. Using machine learning for this task makes it possible to find the real parts that fit the 3D model's parameters quickly and with a high match rate.

In Figure 33, based on the parameters of the 3D model design, the machine learning model computes the similarity between each real part and those parameters, thereby screening out the matching real parts. Without machine learning, the match rate of the screening is about 68%; in other words, about one third of the retrieved parts fail to satisfy the 3D design parameters. With machine learning, the match rate reaches 96%.

Figure 33 Retrieving matching parts

(9) Everyday application scenarios of machine learning

1. Predicting citizens' bus route choices

Based on massive bus ride records, the goal is to mine citizens' behavior patterns in public transport. Taking bus route choice prediction as the direction, the idea is to analyze historical bus card transaction data, mine the behavior patterns of regular passengers, and infer their travel habits and preferences, so as to build a model that predicts which bus routes people will take in the coming week, providing passengers with an information-symmetric, safe, and comfortable travel environment and letting data lead the way toward smart urban mobility.

2. Product image classification

E-commerce sites contain millions of product images, and applications such as "photo shopping" and "find the same item" must classify the product images users provide. Meanwhile, extracted image features can be supplied to recommendation and advertising systems to improve their effectiveness. The goal is to learn from image data so as to classify the images.

3. Spam SMS detection based on text content

Spam messages have increasingly become a problem plaguing operators and mobile phone users, seriously affecting people's normal lives, harming operators' public image, and endangering social stability. Lawbreakers use technical means to keep changing the form of spam messages, and the channels of distribution are very broad; traditional filtering based on rules and keywords has limited effect, and much spam "escapes" the filters and still reaches handsets. The goal is to use the text content of messages, combined with machine learning algorithms and big data analysis, to intelligently identify spam messages and their variants.

4. Analysis of abnormal electricity usage for the State Grid

With the upgrading of the power system and the spread of smart electrical equipment, the State Grid Corporation can collect massive user electricity consumption data and equipment monitoring data in real time. It therefore hopes to use big data analysis to conduct scientific anti-electricity-theft monitoring, improving the efficiency of anti-theft work and reducing the time and cost of analyzing theft behavior.

5. Traffic sign detection in autonomous driving

In autonomous driving, the detection and recognition of traffic signs plays a vital role in understanding the environment around the vehicle, for example detecting speed limit signs to control the current vehicle's speed. Embedding traffic signs into high-precision maps also provides key assistance for positioning and navigation. The goal is to use image data from fully real scenes for training and testing, producing a recognition model that can actually be applied in autonomous driving.

6. User profiling for precision marketing with big data

In modern advertising systems, multi-level, systematic user profiling algorithms are one of the basic technologies for precise ad delivery. Using a month of users' historical search queries together with their demographic attribute labels (gender, age, education) as training data, the goal is to build classification algorithms through machine learning and data mining to determine the demographic attributes of new users.

7. Fine-grained pedestrian recognition in surveillance scenes

With the Safe China and Safe City initiatives, video surveillance is widely used in many fields, which brings convenience to maintaining public order; but it also brings a problem: the massive video streams mean that after an incident, large amounts of manpower and resources are needed to search for useful information. Based on multiple annotated pedestrian images from surveillance scenes, the goal is to study fine-grained pedestrian recognition algorithms on top of localization (head, upper body, lower body, feet, hat, bag) and automatically identify the attributes of the pedestrians in the images.

8. Demand forecasting and warehouse planning

With massive buyer and seller transaction data, data mining can accurately forecast future product demand, helping merchants automate many supply chain decisions. Such big-data-driven supply chains can help merchants greatly reduce operating costs; more accurate demand forecasting can substantially optimize operating costs, shorten delivery times, improve the logistics efficiency of the whole supply chain, and move a step closer to an intelligent supply chain platform. High-quality product demand forecasting is the foundation and core function of supply chain management.

9. Stock price trend prediction

With socio-economic development and people's growing investment awareness, more and more people participate in the stock market, and stock investment has become an important part of life. In the stock market, however, with so many indicators and so much information it is hard to identify the factors most critical to stock prices; moreover, the market structure is extremely complex and the influencing factors are diverse and correlated, making it difficult to uncover the market's inner patterns. The goal is to collect stock market information as comprehensively as possible and build a stock price prediction model on that basis.

10. Earthquake forecasting

From the spatio-temporal map of historical great earthquakes worldwide, fourteen regions correlated with great earthquakes in mainland China are identified; these regions are screened one by one, the best nine are selected, and the great earthquakes occurring in those nine regions are used to predict whether a great earthquake will occur in mainland China within the next year.

11. Outfit matching recommendations

Outfit matching is a very important topic in fashion shopping guidance. Based on outfit combinations created by matching experts and influencers, text and image data for millions of products, and user behavior data, the goal is to mine an outfit matching model from these behavior, text, and image data, provide users with personalized, high-quality, professional outfit suggestions, and predict the set of items that match a given product.

12. Precision merchant marketing based on user trajectories

As mobile Internet usage grows daily, precisely targeting marketing to users based on their profiles has become a new direction for many Internet and non-Internet enterprises. Based on merchant location and category data and user profile label data, the goal is to extract the associations between user labels and merchant categories; then, using a user's location data over a period of time, when the user comes within 300 meters of a merchant, push to the user that merchant's location and other promotional information matching the user's profile.

13. Weather correlation analysis

In socio-economic life, many industries, such as agriculture, transport, construction, tourism, retail, and insurance, are without exception closely tied to changes in the weather. To mine the value of meteorological resources more deeply, the goal is to use sixty years of Chinese surface historical weather data to promote the effective combination of weather data with data from other industries, seeking the relationships among meteorological elements and between weather and other things, so that weather data can deliver more diversified value.

14. Analysis of traffic accident causes

As the times develop, convenient transport makes a huge contribution to society, but all kinds of traffic accidents also seriously affect life, property, and socio-economic development. The goal is to deeply mine data on accident types, people and vehicles involved, weather, driver's license information, drivers' criminal records, and other accident-related data, forming a scheme for analyzing the causes of traffic accidents.

15. Interest-based real-time news recommendation

With the rapid development of the Internet in recent years, personalized recommendation has become an indispensable service on major websites. Portal sites providing all kinds of news are a traditional Internet service, but compared with today's booming e-commerce sites, the level of personalized news recommendation still lags considerably. By analyzing time-stamped user browsing behavior and news text content, the goal is to mine users' news browsing patterns and how they change, and to design a timely and accurate recommendation system that predicts the news a user may be interested in.

4. Deep learning: machine learning's advance to higher intelligence

1. The background of deep learning

In 2006, Geoffrey Hinton, professor at the University of Toronto in Canada and a leading figure in machine learning, and his student Salakhutdinov published "Reducing the Dimensionality of Data with Neural Networks" in Science. The article makes two main points: 1) a neural network with multiple hidden layers has stronger learning ability and can express more features to describe an object; 2) the difficulty of training a deep neural network can be overcome by layer-wise pre-training: the autoencoder network Hinton designed can quickly find a good, near-globally-optimal solution by first training each layer separately with unsupervised methods and then fine-tuning the whole network. The publication of this article opened a new chapter in deep learning. In April 2013, deep learning was ranked first among the ten Breakthrough Technologies of 2013 by MIT Technology Review. Unlike shallow learning models, which rely on hand-crafted experience, deep learning models learn more useful features by building machine learning models with massive training data, ultimately improving the accuracy of classification or prediction.

Figure 34 The development of deep learning

2. The definition of deep learning

Deep learning is a branch of machine learning research belonging to the family of neural networks. By building and simulating the information-processing neural structure of the human brain, deep learning extracts features from externally input data from low level to high level, enabling the machine to understand the data it learns from and obtain information; because such networks have multiple hidden layers, they are also called deep neural networks. After data are fed into the system, deep learning learns by modeling and simulating the brain's neural networks: like biological neurons, the network contains layers of simulated neurons (connection points for information transfer), and the response function of each neuron is assigned a "weight" representing the strength of its connections. Through the layer-by-layer "connections" between neurons, the computer can compute the weighted sum of all neurons at the optimal solution and thereby realize the decision scheme.

3. Foundations and implementation of deep learning

① The conceptual foundation of deep learning: the error backpropagation (BP) algorithm

The BP neural network (Figure 35) was proposed by Rumelhart, McClelland, and others in 1986. It is a multilayer feed-forward neural network trained by the error backpropagation algorithm; it stores a large number of mapping patterns without needing to reveal the mapping equations. The core idea of the BP algorithm is to use the method of steepest descent (gradient descent), adjusting the network's weights and thresholds via backpropagation so as to minimize the sum of squared errors.

Figure 35 BP neural network
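A minimal backpropagation network in pure NumPy, trained by gradient descent on XOR; this is a sketch for illustration, with assumed layer sizes, learning rate, and iteration count:

```python
# Error backpropagation (BP) by hand: forward pass, backward pass,
# steepest-descent weight updates to minimize squared error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: propagate the output error layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates of weights and thresholds (biases)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```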

② A milestone in image processing: the convolutional neural network (CNN)

In the 1960s, while studying neurons responsible for local sensitivity and direction selection in the cat's visual cortex, Hubel and Wiesel found that a special network structure could reduce the complexity of feedback neural networks, which later led to the concept of the convolutional neural network. Because it avoids the need for early image preprocessing and can take raw images directly as input, the CNN has become one of the iconic representatives of neural networks.

Figure 36 Convolutional neural network (CNN)

③ The implementation foundation of deep neural networks: Boltzmann machines and restricted Boltzmann machines

The Boltzmann machine is a stochastic recurrent neural network proposed by Hinton and Sejnowski; it can also be seen as a stochastic Hopfield network, and it is named BM because its sample distribution follows the Boltzmann distribution.

Figure 37 Boltzmann machine

4. Major achievements of deep learning

Through machine learning, artificial intelligence systems acquired inductive reasoning and decision-making abilities, and deep learning pushed these abilities to a higher level. Within deep learning, the convolutional neural network (CNN), as the most effective deep neural network, is now applied in more and more intelligent fields, and these systems behave more and more like humans; AlphaGo, Siri, and Facebook, for example, all apply convolutional neural networks. In intelligent manufacturing, a field China currently pays close attention to, manufacturing robots are a classic case of deep learning: a deep-learning robot can automatically adapt to changes in the external environment and, when faced with new tasks, can automatically readjust its algorithms and techniques.

5. The outlook for deep learning

Deep learning is bound to become a core driving force in the development of artificial intelligence. Although deep learning has achieved much in practical applications, it still has limitations: theoretical research is lacking, unsupervised learning ability is weak, and logical reasoning and memory abilities are missing. Most deep learning research proceeds through experimental training, with little study of its internal principles and the nature of its learning. Current research focuses mostly on network architectures and parameter choices; deep learning still has room for improvement and needs more complete, deeper theory to support its development.

Current mainstream applications are dominated by supervised learning, but in real life unlabeled, unknown data form the majority, so unsupervised learning, which can discover the inner relationships of things, deserves wider application and has broad room for development. Deep learning is a huge driving force for artificial intelligence; at the current stage it focuses more on processing data, and facing more complex tasks will require more memory capacity and logical reasoning ability.

5. The future of machine learning: challenges and opportunities coexist

Machine learning is another important research area of artificial intelligence applications. Despite major technical progress in the field, machine learning's capacity for autonomous learning is still very limited: it does not yet possess human-like learning ability, and its development faces enormous challenges; technical hurdles such as generalization ability, speed, interpretability, and data utilization must be overcome. Encouragingly, computer scientists have made great progress in developing certain complex, human-like neural analysis algorithms, and many autonomous algorithms and models can already be built that let machines exhibit efficient learning. Further in-depth research on machine learning is bound to advance the deeper application and development of artificial intelligence technology.
