What are the core issues that need to be addressed to build an unmanned retail store like Amazon Go?

It has been four months since the launch of Amazon Go. During this period, many traditional retailers and entrepreneurs in China have taken notice and begun testing the waters with unmanned retail stores of their own. However, lacking familiarity with the technical and engineering details, they often do not know where to start.

In addition to AI technology such as computer vision, unmanned retail involves large numbers of sensors and smart devices, which makes it a large and complex project. Lei Feng Network has learned that although some people have already shared their interpretations of how unmanned retail stores work, most of them only describe rough implementations and do not go into details.

For this reason, Chen Weilong (WeChat: daoyuan3), an unmanned retail store entrepreneur, answers these questions by walking us through an unmanned retail store project based on the principles of Amazon Go.

First, the core issues of unmanned retail

Chen Weilong mentioned that the core of an unmanned retail store like Amazon Go is to solve the problem of "what product is handled by whom?" There are five factors to be dealt with: person, person's location, merchandise, merchandise location, and action.

Actions are mainly identified from the state of the merchandise relative to the hand or the shelf. For example, the gesture of the hand as it enters or leaves the shelf, or whether the hand is holding an item, indicates whether merchandise is being taken or put back.

Product identification relies mainly on a carefully chosen initial state. In the intermediate state, checking against the shopping list narrows the identification range and reduces the difficulty, while employees make sure that the initial state is not destroyed.

Location is determined mainly through mobile phone positioning, sensor positioning, and image positioning. Human pose recognition can attribute an action to the right person.

The convenience of Amazon Go comes from moving the job of checking merchandise and ringing it up away from the cashier and into each individual purchase action of each customer. A cashier processes customers centrally and serially: all customers must go to a designated place, confirm all their purchases at once, and wait for the cashier to be free.

Amazon Go, by contrast, processes customers in a distributed, parallel way. Each shopping action is processed and recorded by the system as it happens, so no customer has to wait for another.

Second, how is "handling" defined and represented?

Chen Weilong gave an example that describes the action through static states. For the supermarket, goods have two states: sold or not sold. For the shelf, the goods are either on the shelf or not on the shelf. For the customer, the goods are either bought or not bought. For the human hand, the goods are either in hand or not in hand, which simplifies further to picked up or put back.

Their relationship chain is as follows:

If a customer buys a product, the product must pass through the states of being picked up and being in hand. Correspondingly, the product is no longer on the shelf.

In the same way, if a customer decides not to buy a product they picked up, the product must pass through the states of being put back and not being in hand. Correspondingly, the product is on the shelf again.

There is also the case where a customer does not buy a product and never picks it up or puts it back. This case does not need to be handled, because nothing happens.

How can these two states be represented or measured? Both cameras and sensors can detect them.

1. How to handle the camera

"When the hand enters the shelf to pick up goods, and again after it has taken them, a pair of pictures is taken of the hand, recorded as the first picture and the second picture. A pair of pictures is also taken of the shelf, recorded as the third picture and the fourth picture. To compare the first and second pictures, a CV algorithm identifies skin color to find the hand and then recognizes whether the hand is in a gripping posture or a stretched (open) posture. The order in which the two postures appear in the first and second pictures shows whether goods were taken or put back. For example, if the first picture shows a grip and the second shows a stretched hand, the item was put back; if the first picture shows a stretched hand and the second shows a grip, the item was taken." Chen Weilong described the camera's processing flow in great detail.

He went on to explain: after finding the hand by skin color, compare the color contrast at the edges of the hand in the first and second pictures to tell whether a product is in the hand. Again, the order shows whether the product was taken or put back. For example, if the contrast at the edge of the hand is light in the first picture, indicating no product, and dark in the second picture, indicating a product, then the product is in the hand and is being taken. A put-back is recognized in the same way, in reverse.

The third and fourth pictures allow the same judgment to be made from the shelf. For example, if the fourth picture shows one or more products more than the third picture, something was put back; if the fourth picture shows one or more products fewer than the third picture, something was taken.
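
The two decision rules above can be written down in a few lines. This is only an illustrative sketch: it assumes that an upstream vision step has already reduced each picture to a gesture label or a product count, and the names `Gesture`, `action_from_gestures`, and `action_from_shelf_counts` are hypothetical, not taken from Chen Weilong's project.

```python
from enum import Enum


class Gesture(Enum):
    GRIP = "grip"        # hand closed around an item
    STRETCH = "stretch"  # open, empty hand


def action_from_gestures(first, second):
    """Infer the action from the gesture order in the first and second pictures."""
    if first is Gesture.STRETCH and second is Gesture.GRIP:
        return "take"
    if first is Gesture.GRIP and second is Gesture.STRETCH:
        return "put_back"
    return None  # same gesture twice: nothing can be concluded


def action_from_shelf_counts(third_count, fourth_count):
    """Infer the action from product counts in the third and fourth pictures."""
    if fourth_count < third_count:
        return "take"
    if fourth_count > third_count:
        return "put_back"
    return None  # shelf unchanged
```

Because the hand pictures and the shelf pictures give two independent readings of the same event, a real system could compare the two results and flag the event when they disagree.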

2. How to handle the sensor

For weight (gravity) sensors: when goods are taken, there are fewer goods and the weight decreases; when goods are put back, there are more goods and the weight increases. So a change in the weight reading indicates a take or a put-back.

For infrared sensors at a specific position: when goods are put back, the infrared beam is blocked; when goods are taken, the beam is not blocked. The blocked or unblocked state therefore indicates a put-back or a take.
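
As a minimal sketch of these two sensor rules, the functions below map a pair of readings to an action. The function names and the noise tolerance `tolerance_g` are assumptions made for illustration; the article does not specify any thresholds.

```python
def action_from_weight(before_g, after_g, tolerance_g=5.0):
    """Infer the action from two shelf weight readings, in grams.

    tolerance_g is an assumed noise margin, not a value from the article.
    """
    delta = after_g - before_g
    if delta <= -tolerance_g:
        return "take"      # weight went down: goods were removed
    if delta >= tolerance_g:
        return "put_back"  # weight went up: goods were added
    return None            # change is within sensor noise


def action_from_infrared(was_blocked, is_blocked):
    """Infer the action from the infrared beam state before and after."""
    if was_blocked and not is_blocked:
        return "take"      # item no longer blocks the beam
    if not was_blocked and is_blocked:
        return "put_back"  # item now blocks the beam
    return None            # no change in the beam state
```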

Third, how to effectively identify products?

Product identification is probably one of the most difficult key problems.

A supermarket generally carries 1,000 to 100,000 different kinds of merchandise, and it is almost impossible to identify that many kinds in a real environment that may have been disturbed.

Chen Weilong pointed out that in the initial state, specific categories of goods are placed in specific locations and monitored by cameras and sensors. Each camera or sensor only needs to recognize a small number of specific product categories and their quantities, which is relatively simple. Even if a product cannot be identified from the picture, it can be identified and screened by weight. When deciding the layout, easily distinguishable categories can be placed together. All picked products are recorded in the customer's shopping list.
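
Screening by weight within one shelf's small candidate set might look like the following sketch. The function name, the product names, the unit weights, and the tolerance are all made up for illustration.

```python
def identify_by_weight(removed_g, candidates, tolerance_g=5.0):
    """Return (product, quantity) hypotheses that explain a weight change.

    candidates maps product name to unit weight in grams and holds only
    the few categories placed on this shelf in the initial state.
    """
    matches = []
    for name, unit_g in candidates.items():
        quantity = round(removed_g / unit_g)
        if quantity >= 1 and abs(removed_g - quantity * unit_g) <= tolerance_g:
            matches.append((name, quantity))
    return matches


# Hypothetical shelf with three categories (unit weights in grams).
shelf = {"cola_can": 355.0, "chips_bag": 80.0, "chocolate_bar": 48.0}

two_colas = identify_by_weight(708.0, shelf)   # one hypothesis
ambiguous = identify_by_weight(240.0, shelf)   # two hypotheses
```

Here 708 g can only be two cola cans, while 240 g could be either three bags of chips or five chocolate bars. This kind of ambiguity is why it helps to place together categories whose weights, or appearance, are easy to tell apart.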

The difficulty lies in the intermediate state. When customers put items back, the initial state is destroyed and the difficulty of recognition rises sharply. Because a customer may put back any product anywhere, the identification range expands to the point where the problem becomes unsolvable.

First consider the final state after a put-back, which generally falls into two kinds: put back in the right place or put back in the wrong place.

For a correct put-back, the identification difficulty is the same as in the initial state.

There are three cases for a wrong put-back: put back wrongly but recognized; put back wrongly and not recognized; put back wrongly and misidentified.

A wrong put-back is still recognized when the product itself is easy to identify by image and weight, and this is the less likely case. In most cases, a wrongly returned product is either unrecognized or misidentified. In the unrecognized case, a message can be sent to the user for confirmation. In the misidentified case, the only remedies are to improve the accuracy of the algorithm and to adjust the decision thresholds so that some misidentifications become "unrecognized" instead, while promptly notifying employees to tidy up and restore the initial state.

In practice, customers often put goods back after picking them up, and often not in the right place. Among wrong put-backs, the unrecognized and misidentified cases are the majority, and the recognized cases are the minority. Overall, the proportion of unrecognized and misjudged cases cannot be ignored and can even make the entire system unworkable.

As just described, put-backs expand the identification range to all products, but this problem can be partially solved. Every put-back is preceded by the customer's earlier pick-ups, so the returned product must be one that the customer previously picked up, and the customer's shopping list can be searched first. In this case, the recognition difficulty returns to a level comparable to the initial state.
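
The idea of searching the shopping list first can be sketched as follows. The function name, the data layout, and the tolerance are assumptions for illustration only; the sketch uses weight as the only signal, whereas a real system would presumably combine it with image recognition.

```python
def identify_returned_item(added_g, shopping_list, unit_weights, tolerance_g=5.0):
    """Identify a put-back item by searching only the customer's shopping list.

    shopping_list maps product name to the quantity picked so far.
    unit_weights maps product name to unit weight in grams.
    Returns the product name, or None when the match is missing or ambiguous.
    """
    candidates = sorted(
        name
        for name, quantity in shopping_list.items()
        if quantity > 0 and abs(unit_weights[name] - added_g) <= tolerance_g
    )
    if len(candidates) == 1:
        return candidates[0]
    return None  # ask the customer or staff to confirm


# Hypothetical catalog and shopping list.
unit_weights = {"cola_can": 355.0, "chips_bag": 80.0, "chocolate_bar": 48.0}
picked = {"cola_can": 2, "chips_bag": 1}

returned = identify_returned_item(354.0, picked, unit_weights)
unknown = identify_returned_item(48.0, picked, unit_weights)
```

The search space is the handful of items the customer has picked up rather than the whole catalog, which is what brings the difficulty back down to roughly the initial level.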

Chen Weilong summed up:

In the initial state (pick-up), the identification range is a small number of specific categories, probably no more than five, and identification by picture and weight screening is relatively easy. Careful selection and placement of categories beforehand makes the initial state even easier to identify.

In the intermediate state (put-back), looking up the customer's shopping list narrows the range to specific items on the list, so the recognition difficulty does not rise sharply compared with the initial state and stays within a manageable range.

In the final state, because put-back recognition is not perfectly accurate, the final state deviates from the initial state. Each put-back adds to the deviation, which could eventually lead to collapse. However, because put-back identification is not very difficult, such deviations are rare or acceptable and will not accumulate over time into a collapse. Combined with prompt notification of employees to tidy up, the deviation can be corrected.

Fourth, how to know the identity of the initiator of the action?

Attributing a product to the person who initiated the action is a more complex undertaking.

According to Chen Weilong, there are generally two ways to identify a person: by physical features, such as the face, or by objects the person carries, such as a mobile phone. The accuracy of face recognition is within an acceptable range, and a face can almost be used as a unique identifier. A mobile phone is also a unique identifier: the system can determine who the customer is by determining who owns the phone.

The "by whom" implies one more factor: location. To confirm who took the goods, for example whether it was Xiao Zhang or Xiao Chen, the locations of the goods and of the customer must match.

Fifth, how to match customer ID and product ID?

To match the IDs of customers and products, we must first determine the positions of people and products.

For positioning people, tracking systems can be used. The GPS, Wi-Fi, and Bluetooth of the mobile phone can also provide accurate and rich location information.

For positioning products, the initial positions of the infrared sensors, weight sensors, cameras, goods, and shelves are all known, so product positions can be inferred. For example, partitions can divide a shelf into separate compartments by product type. Each compartment can have its own infrared or weight sensor, so the system knows from which compartment a product was taken or to which compartment it was returned.

By matching the positions of people and products, the two factors "what product" and "by whom" are connected.

Because of cost and technical constraints, location accuracy is a big problem. Combined with the flaws in the schemes for locating people and products, this makes the matching error between customers and products large. For example, customer A stands in front of product A, customer B stands in front of product B, and customer A reaches for product B. The system cannot judge this situation accurately. It can of course ask the customer to confirm, but that is only a weak remedy.
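
A position-based matcher, and its weakness, can be sketched as below. This is an assumed nearest-customer rule for illustration only, not a description of Amazon Go or of Chen Weilong's system; the function name, the reach limit `max_reach_m`, and the ambiguity margin `min_margin_m` are all hypothetical.

```python
import math


def match_customer(event_pos, customer_positions, max_reach_m=1.0, min_margin_m=0.3):
    """Attribute a shelf event to the nearest customer within reach.

    event_pos is the (x, y) position of the compartment, in meters.
    customer_positions maps customer ID to an (x, y) position.
    Returns a customer ID, or None when nobody is in reach or when the
    two nearest customers are too close together to tell apart.
    """
    in_reach = sorted(
        (math.dist(event_pos, pos), customer_id)
        for customer_id, pos in customer_positions.items()
        if math.dist(event_pos, pos) <= max_reach_m
    )
    if not in_reach:
        return None
    if len(in_reach) > 1 and in_reach[1][0] - in_reach[0][0] < min_margin_m:
        return None  # ambiguous: fall back to customer confirmation
    return in_reach[0][1]


# The failure case from the text: A stands in front of product A at x = 0,
# B stands in front of product B at x = 1, and A reaches across for product B.
product_b_pos = (1.0, 0.0)
customers = {"A": (0.0, 0.5), "B": (1.0, 0.5)}
blamed = match_customer(product_b_pos, customers)
```

With these coordinates the rule attributes the event to customer B, although customer A took the product. Position alone cannot resolve this case, which is why the pose-recognition approach discussed next matters.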

As mentioned earlier, Amazon Go may use multi-angle, full-body human pose recognition to establish the person-action-product relationship. The key to this approach is that the cameras need a good field of view and must be sufficiently numerous. Judging from the promotional video, Amazon Go's shelf design makes it impossible to get a good enough field of view of the lowest and middle layers. A possible solution is to rely on cameras on the opposite shelf and on the ceiling. The Amazon Go shelf structure is very important: given a structural drawing or a photograph of the actual shelf, one could speculate further about the implementation. The shelf structure includes whether each layer has an imaginary head, the shape and size of the load-bearing surface that carries the goods, special openings and screw positions, and the like.

To sum up

With the methods above, the difficulty is not as great as one might imagine, but the amount of work is not small. Even if action and product identification reach 100% accuracy, the positioning scheme and its limited accuracy leave a certain error in the overall identification, which makes the whole scheme unusable or usable only with some help from customers. So Amazon Go and other unmanned retail stores will, for the foreseeable future, only be able to serve a small, specific group of people, such as members with good credit.

Chen Weilong divides cashier-less checkout into three stages: a stage that collects statistics on product and customer behavior data; a stage of cashier-less checkout that recognizes normal shopping; and a stage of cashier-less checkout that also recognizes cheating.

The first stage is relatively easy to implement. Because it only gathers statistics, customers have no reason to cheat, and the tolerable statistical error is larger than for cashier-less checkout.

In the second stage, provided customers do not cheat, the system records 100% of customer and product data and achieves cashier-less checkout. Amazon Go is currently at this stage. The obvious feature of this stage is the need to build a high-quality member base.

The third stage is to recognize any cheating and keep accurate statistics. At this stage, cheating in supermarkets and the wider retail industry can be detected and eliminated. This not only achieves cashier-less checkout but also completely solves the problem of theft.

Given cost and technical constraints, Chen Weilong thinks it will be possible to reach the second stage and come close to the third. For example, the system recognizes normal shopping behavior, while special cases are flagged and handled directly at a manual checkout. On this basis, the scope of normal shopping behavior can be expanded and the scope of special cases narrowed, making the store friendlier and smarter for customers.
